@Parthiba-Hazra
Contributor

This PR introduces an orphan checkpoint retention policy that lets users control whether orphaned checkpoints are retained or deleted.

  • A new `retainOrphan` field is added to the global, container, pod, and namespace policies.
  • By default, orphan checkpoints are retained; setting `retainOrphan` to `false` deletes them.
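
To illustrate how an optional `retainOrphan` value could be resolved across the four policy levels, here is a minimal Go sketch. It is not taken from this PR: the `Policy` type, the `resolveRetainOrphan` helper, and the "most specific level wins" ordering are assumptions made for illustration. Only the default of `true` comes from the description above.

```go
package main

import "fmt"

// Policy is a hypothetical stand-in for one policy level (global,
// namespace, pod, or container). RetainOrphan is a pointer so that
// "not set" can be told apart from an explicit false.
type Policy struct {
	RetainOrphan *bool
}

// resolveRetainOrphan returns the first explicitly set value, checking
// the levels in the order given (assumed here: most specific first).
// Nil policies and unset fields are skipped. If nothing is set, it
// returns true, the default described in this PR.
func resolveRetainOrphan(mostSpecificFirst ...*Policy) bool {
	for _, p := range mostSpecificFirst {
		if p != nil && p.RetainOrphan != nil {
			return *p.RetainOrphan
		}
	}
	return true
}

func boolPtr(b bool) *bool { return &b }

func main() {
	global := &Policy{RetainOrphan: boolPtr(false)}
	pod := &Policy{RetainOrphan: boolPtr(true)}
	var container, namespace *Policy // not configured

	// Assumed order: container, pod, namespace, global.
	fmt.Println(resolveRetainOrphan(container, pod, namespace, global))
	fmt.Println(resolveRetainOrphan(nil, nil, nil, global))
	fmt.Println(resolveRetainOrphan())
}
```

Using a pointer rather than a plain `bool` keeps an omitted field from silently turning into `false`, which would otherwise invert the documented default.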

@rst0git
Member

rst0git commented Aug 13, 2024

@Parthiba-Hazra Would it be possible to rebase this pull request on the main branch?

- Introduced global, container, pod, and namespace-level
  policies for checkpoint retention, based on storage/size limits.
- Updated CRD definitions to store the storage/size based policies.
- Updated the sample configuration of CheckpointRestoreOperator with
  storage/checkpoint-size based policies

Signed-off-by: Parthiba-Hazra <[email protected]>

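
As a rough illustration of what a storage-based retention policy can look like, the following Go sketch selects the oldest checkpoints for deletion until the total size fits under a limit. This is an assumption about one plausible strategy, not the logic in this commit; the `Checkpoint` type, its fields, and `selectForDeletion` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Checkpoint is a hypothetical record for one checkpoint archive.
type Checkpoint struct {
	Path        string
	Size        int64 // bytes
	CreatedUnix int64 // creation time, seconds since the epoch
}

// selectForDeletion returns the oldest checkpoints that would have to
// be removed so that the remaining total size is at most maxTotalSize.
// The input slice is not modified.
func selectForDeletion(cps []Checkpoint, maxTotalSize int64) []Checkpoint {
	sorted := append([]Checkpoint(nil), cps...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].CreatedUnix < sorted[j].CreatedUnix
	})

	var total int64
	for _, c := range sorted {
		total += c.Size
	}

	var doomed []Checkpoint
	for _, c := range sorted {
		if total <= maxTotalSize {
			break
		}
		doomed = append(doomed, c)
		total -= c.Size
	}
	return doomed
}

func main() {
	const mb = int64(1 << 20)
	cps := []Checkpoint{
		{Path: "b.tar", Size: 6 * mb, CreatedUnix: 200},
		{Path: "a.tar", Size: 4 * mb, CreatedUnix: 100},
		{Path: "c.tar", Size: 3 * mb, CreatedUnix: 300},
	}
	for _, c := range selectForDeletion(cps, 10*mb) {
		fmt.Println("would delete", c.Path)
	}
}
```
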
- Enhance generate_checkpoint_tar.sh to optionally
  generate tar files larger than 5MB
- Update GitHub Actions workflow to test storage quota
  garbage collection policies

Signed-off-by: Parthiba-Hazra <[email protected]>

- To implement the orphan checkpoint retention policy, the manager
  requires permissions to watch and get resources. This allows the
  manager pod to watch the relevant resources and retrieve the
  necessary resource information when applying the policies.

Signed-off-by: Parthiba-Hazra <[email protected]>

- Added support for orphan retention policies at the global,
  namespace, pod, and container levels.
- Introduced the `retainOrphan` field in each policy type to
  control the retention of orphan checkpoints.
- Updated the policy application logic to delete all orphan
  checkpoints when `retainOrphan` is set to false.
- Implemented a PodWatcher to monitor pod deletions and apply
  policies immediately when a resource is deleted.

Signed-off-by: Parthiba-Hazra <[email protected]>

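
The deletion rule described in this commit can be sketched as follows. This is a simplified, hypothetical model rather than the operator's code: it treats a checkpoint as orphaned when its pod is absent from a set of live pods, and the `Checkpoint` type, the `"namespace/pod"` key format, and `orphansToDelete` are illustrative assumptions.

```go
package main

import "fmt"

// Checkpoint is a hypothetical record linking a checkpoint archive to
// the pod it was taken from.
type Checkpoint struct {
	Path      string
	Namespace string
	Pod       string
}

// orphansToDelete returns the checkpoints whose pod is not present in
// livePods (keyed by "namespace/pod"). When retainOrphan is true,
// nothing is selected, matching the default behaviour.
func orphansToDelete(cps []Checkpoint, livePods map[string]bool, retainOrphan bool) []Checkpoint {
	if retainOrphan {
		return nil
	}
	var orphans []Checkpoint
	for _, c := range cps {
		if !livePods[c.Namespace+"/"+c.Pod] {
			orphans = append(orphans, c)
		}
	}
	return orphans
}

func main() {
	cps := []Checkpoint{
		{Path: "web-1.tar", Namespace: "default", Pod: "web"},
		{Path: "job-1.tar", Namespace: "batch", Pod: "job"},
	}
	live := map[string]bool{"default/web": true} // "batch/job" was deleted

	for _, c := range orphansToDelete(cps, live, false) {
		fmt.Println("orphan:", c.Path)
	}
}
```

In a setup like the one described above, a pod watcher would presumably trigger this kind of check when it observes a deletion, so orphans are cleaned up without waiting for the next checkpoint.
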
- Add `test_orphan_retention_policy` test and update
  GitHub Actions workflow to test orphan retention policy

Signed-off-by: Parthiba-Hazra <[email protected]>

- `checkpointDirectory`: Specifies the directory where checkpoints are stored.
- `applyPoliciesImmediately`: If set to `true`, the policies are applied immediately. If `false` (the default), they are applied after a new checkpoint is created.
- `globalPolicy`: Defines global checkpoint retention limits.
- `retainOrphan`: If set to `true` (default), orphan checkpoints (checkpoints whose associated resources have been deleted) will be retained. If set to `false`, orphan checkpoints will be automatically deleted.
Member

Suggested change
-- `retainOrphan`: If set to `true` (default), orphan checkpoints (checkpoints whose associated resources have been deleted) will be retained. If set to `false`, orphan checkpoints will be automatically deleted.
+- `retainOrphan`: If set to `true` (default), orphan checkpoints (checkpoints whose associated resources have been deleted) will be retained. If set to `false`, orphan checkpoints will be automatically deleted. This is particularly useful for transient checkpoints used to recover from errors by replacing 'container restart' with 'container restore'.

@rst0git
Copy link
Member

rst0git commented May 8, 2025

Closing in favour of #60

@rst0git rst0git closed this May 8, 2025